Skip to content

Optimize: core slice binary_search_by #141097

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

RoDmitry
Copy link

@RoDmitry RoDmitry commented May 16, 2025

I have found a more optimal algorithm for the binary search. I have ran benchmarks and it works faster than the previous one. Also I have looked at the compiled assembly, with and without an early return, and it looks good either way. And benchmarks show that it's faster with an early return. Looks strange for me that "CPU can reliably predict the loop count" based on the mathematical operation as it was before (substraction). But I don't know much about branch prediction, so feel free to improve it if possible.

Also I would like to know, is there any attitude towards optimizations using SIMD? Can I write platform specific SIMD intrinsics, or what should I use if I would like to contribute SIMD code optimizations? Is there already any SIMD code optimizations in core/std, I could look at as an example?

P.S. It's my first contribution here.

@rustbot
Copy link
Collaborator

rustbot commented May 16, 2025

r? @thomcc

rustbot has assigned @thomcc.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 16, 2025
@rust-log-analyzer

This comment has been minimized.

@RoDmitry RoDmitry force-pushed the optimize_slice_bsb branch from 512dda9 to d2c7ac8 Compare May 16, 2025 17:22
@rust-log-analyzer

This comment has been minimized.

@the8472
Copy link
Member

the8472 commented May 16, 2025

There have been many changes and attempts to binary_search.

Some versions of this look quite like yours, so you'll have to justify this change by a more detailed analysis and benchmark results. std already has some that you can try to run.
Which implementation performs best also depends on the exact types, the comparison function and the distribution of the values being used for benchmarking.

Also I would like to know, is there any attitude towards optimizations using SIMD? Can I write platform specific SIMD intrinsics, or what should I use if I would like to contribute SIMD code optimizations? Is there already any SIMD code optimizations in core/std, I could look at as an example?

That's possible, but only worthwhile in a few cases where good results can be achieved with the baseline SIMD instructions available on tier1 platforms, e.g. SSE2 on x86-64-linux, since runtime detection isn't available in core.
So we only have a few of those and mostly rely on autovectorization. But you can look at #103779 for an example.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented May 16, 2025

I have ran benchmarks and it works faster than the previous one.

Which benchmarks? Do they happen to look up the same value in the same slice repeatedly? Or otherwise have predictable patterns in the outcome of the match cmp that this change introduces? I wouldn’t be surprised if this change will look like a performance improvement for such benchmarks, but that’s not representative for many real world uses of binary search.

@RoDmitry RoDmitry force-pushed the optimize_slice_bsb branch from d2c7ac8 to def517a Compare May 16, 2025 18:09
@rust-log-analyzer

This comment has been minimized.

@RoDmitry
Copy link
Author

I agree that you need a more detailed analysis and benchmark results. Have found benchmarks in library/coretests/benches/slice.rs. Will try them later.

@RoDmitry RoDmitry force-pushed the optimize_slice_bsb branch 2 times, most recently from d27f18f to 08464b5 Compare May 16, 2025 23:51
@rust-log-analyzer

This comment has been minimized.

@RoDmitry RoDmitry force-pushed the optimize_slice_bsb branch from 08464b5 to b005d21 Compare May 17, 2025 00:16
@rust-log-analyzer

This comment has been minimized.

@RoDmitry RoDmitry force-pushed the optimize_slice_bsb branch from b005d21 to 477911e Compare May 17, 2025 18:22
@rust-log-analyzer
Copy link
Collaborator

The job mingw-check-tidy failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
info: removing rustup binaries
info: rustup is uninstalled
##[group]Image checksum input
mingw-check-tidy
# We use the ghcr base image because ghcr doesn't have a rate limit
# and the mingw-check-tidy job doesn't cache docker images in CI.
FROM ghcr.io/rust-lang/ubuntu:22.04

ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
  g++ \
---

COPY host-x86_64/mingw-check/validate-toolstate.sh /scripts/
COPY host-x86_64/mingw-check/validate-error-codes.sh /scripts/

# NOTE: intentionally uses python2 for x.py so we can test it still works.
# validate-toolstate only runs in our CI, so it's ok for it to only support python3.
ENV SCRIPT TIDY_PRINT_DIFF=1 python2.7 ../x.py test \
           --stage 0 src/tools/tidy tidyselftest --extra-checks=py,cpp
#
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
#    pip-compile --allow-unsafe --generate-hashes reuse-requirements.in
---
#12 2.875 Building wheels for collected packages: reuse
#12 2.876   Building wheel for reuse (pyproject.toml): started
#12 3.088   Building wheel for reuse (pyproject.toml): finished with status 'done'
#12 3.089   Created wheel for reuse: filename=reuse-4.0.3-cp310-cp310-manylinux_2_35_x86_64.whl size=132719 sha256=d2a2565e7037ad3883fb9337653f2e25bbb588534fbef3697286cbc26d1bf634
#12 3.089   Stored in directory: /tmp/pip-ephem-wheel-cache-svz1005y/wheels/3d/8d/0a/e0fc6aba4494b28a967ab5eaf951c121d9c677958714e34532
#12 3.091 Successfully built reuse
#12 3.092 Installing collected packages: boolean-py, binaryornot, tomlkit, reuse, python-debian, markupsafe, license-expression, jinja2, chardet, attrs
#12 3.498 Successfully installed attrs-23.2.0 binaryornot-0.4.4 boolean-py-4.0 chardet-5.2.0 jinja2-3.1.4 license-expression-30.3.0 markupsafe-2.1.5 python-debian-0.1.49 reuse-4.0.3 tomlkit-0.13.0
#12 3.498 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#12 4.043 Collecting virtualenv
#12 4.091   Downloading virtualenv-20.31.2-py3-none-any.whl (6.1 MB)
#12 4.271      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.1/6.1 MB 34.0 MB/s eta 0:00:00
#12 4.333 Collecting platformdirs<5,>=3.9.1
#12 4.340   Downloading platformdirs-4.3.8-py3-none-any.whl (18 kB)
#12 4.362 Collecting distlib<1,>=0.3.7
#12 4.369   Downloading distlib-0.3.9-py2.py3-none-any.whl (468 kB)
#12 4.379      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 469.0/469.0 KB 53.3 MB/s eta 0:00:00
#12 4.419 Collecting filelock<4,>=3.12.2
#12 4.426   Downloading filelock-3.18.0-py3-none-any.whl (16 kB)
#12 4.508 Installing collected packages: distlib, platformdirs, filelock, virtualenv
#12 4.702 Successfully installed distlib-0.3.9 filelock-3.18.0 platformdirs-4.3.8 virtualenv-20.31.2
#12 4.703 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#12 DONE 4.8s

#13 [7/8] COPY host-x86_64/mingw-check/validate-toolstate.sh /scripts/
#13 DONE 0.0s
---
DirectMap4k:      147392 kB
DirectMap2M:     7192576 kB
DirectMap1G:    11534336 kB
##[endgroup]
Executing TIDY_PRINT_DIFF=1 python2.7 ../x.py test            --stage 0 src/tools/tidy tidyselftest --extra-checks=py,cpp
+ TIDY_PRINT_DIFF=1 python2.7 ../x.py test --stage 0 src/tools/tidy tidyselftest --extra-checks=py,cpp
##[group]Building bootstrap
    Finished `dev` profile [unoptimized] target(s) in 0.05s
##[endgroup]
WARN: currently no CI rustc builds have rustc debug assertions enabled. Please either set `rust.debug-assertions` to `false` if you want to use download CI rustc or set `rust.download-rustc` to `false`.
downloading https://static.rust-lang.org/dist/2025-05-12/rustfmt-nightly-x86_64-unknown-linux-gnu.tar.xz
---
##[endgroup]
[TIMING] core::build_steps::tool::ToolBuild { compiler: Compiler { stage: 0, host: x86_64-unknown-linux-gnu, forced_compiler: false }, target: x86_64-unknown-linux-gnu, tool: "tidy", path: "src/tools/tidy", mode: ToolBootstrap, source_type: InTree, extra_features: [], allow_features: "", cargo_args: [], artifact_kind: Binary } -- 35.752
[TIMING] core::build_steps::tool::Tidy { compiler: Compiler { stage: 0, host: x86_64-unknown-linux-gnu, forced_compiler: false }, target: x86_64-unknown-linux-gnu } -- 0.000
fmt check
Diff in /checkout/library/core/src/slice/mod.rs:2901:
                     // SAFETY: same as the `get_unchecked` above.
                     unsafe { hint::assert_unchecked(mid < self.len()) };
                     return Ok(mid);
-                },
+                }
                 Ordering::Greater => base,
             };
             /* if cmp == Equal {
fmt: checked 6011 files
Build completed unsuccessfully in 0:01:16
  local time: Sat May 17 18:32:50 UTC 2025
  network time: Sat, 17 May 2025 18:32:50 GMT
##[error]Process completed with exit code 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants